Approximating Optimal Policies for Agents with Limited Execution Resources

Authors

  • Dmitri A. Dolgov
  • Edmund H. Durfee

Abstract

This paper considers the problem of composing or scheduling several (non-deterministic) behaviors so as to conform to a specified target behavior while also satisfying constraints imposed by the environment in which the behaviors are to be performed. This problem has already been considered by several works in the literature and applied to areas such as web service composition, the composition of robot behaviors, and the co-ordination of distributed devices. We develop a sound and complete algorithm for determining such a composition, which has a number of significant advantages over previous proposals: a) our algorithm differs from previous proposals, which resort to dynamic logic or simulation relations; b) we have realized an implementation in Java, whereas there are no known implementations of the other approaches; c) our algorithm determines all possible schedulers at once; and d) we can use our framework to define a notion of approximation when the target behavior cannot be realized.

Building and developing reusable modules is one of the cornerstones of computer science. Furthermore, building on previously established infrastructure has allowed us to construct elaborate and sophisticated structures, from skyscrapers and aeroplanes through to the World Wide Web. Developing the components that go into these structures is only part of the problem, however. Once we have them in place, we need methods for piecing them together so as to achieve the desired outcome. In this paper we consider the problem of composing behaviors. This problem has already attracted some attention in the recent literature [Berardi et al., 2008; Calvanese et al., 2008; de Giacomo and Sardina, 2007; Sardina and de Giacomo, 2007; Sardina et al., 2008; Sardina and de Giacomo, 2008; Berardi et al., 2006b; 2006a], with several proposals being put forward.
More precisely, we consider the problem of composing or scheduling several (non-deterministic) behaviors so as to conform to a specified (deterministic) target behavior while also satisfying constraints imposed by the environment in which the behaviors are to be performed. These behaviors are abstractions that can represent a variety of mechanisms, such as programs, robot actions, capabilities of software agents, or physical devices. As such, solutions to this problem have a wide field of applicability, from composing web services [Berardi et al., 2008] through to co-ordinating multiple robots or software agents [de Giacomo and Sardina, 2007; Sardina and de Giacomo, 2007; Sardina et al., 2008; Sardina and de Giacomo, 2008]. The closest work to this paper is that of Sardina et al. [2008], which proposes a regression-based technique for solving this problem; here, by contrast, we present a progression-based technique and briefly consider the possibility of approximating the target behavior.

For example, consider an urban search and rescue setting with three types of robots. Scout robots can search for victims and report their location. Diagnosis robots can assess victims and determine whether their condition requires special transportation; if it does not, the diagnosis robot can guide the victims to safety. Rescue robots can carry immobile victims to safety. We elaborate on this example using our framework in this paper.

This paper makes four main contributions that improve on previous approaches to this problem: 1) we provide a sound and complete algorithm for solving the behavior composition problem that works as a forward search, in contrast to the proposal of Sardina et al. [2008], which can be seen as a backward search; 2) we have implemented our algorithm in Java, the first known implementation of a solution to this problem; 3) our algorithm determines all possible schedulers for the target behavior (as can the proposals in [Berardi et al., 2008; Sardina et al., 2008]); and 4) our approach allows the definition of approximate solutions to the behavior composition problem, which can be used when the target behavior cannot be realized by the available behaviors in the given environment.
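To make the composition problem and the idea of "all possible schedulers" more concrete, here is a minimal Python sketch built on the search and rescue example. It is not the authors' algorithm or their Java implementation: it assumes a deterministic target, omits the environment and any final-state conditions, and the names `compose`, `step`, and the state labels (`t0`, `d0`, `d_wait`, and so on) are hypothetical. The sketch first explores joint states forward from the initial state, then prunes, as a greatest fixpoint, every joint state in which some requested action has no behavior whose every non-deterministic outcome stays in a surviving state.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical encodings (assumptions for illustration, not from the paper):
#   target:   state -> action -> next state (deterministic)
#   behavior: state -> action -> set of possible next states (non-deterministic)
# The environment and final states are deliberately left out.
Target = Dict[str, Dict[str, str]]
Behavior = Dict[str, Dict[str, Set[str]]]
Joint = Tuple[str, Tuple[str, ...]]  # (target state, state of each behavior)


def step(target: Target, behaviors: List[Behavior], state: Joint,
         action: str, i: int) -> Set[Joint]:
    """Joint states that may result from delegating `action` to behavior i."""
    t, bs = state
    t_next = target[t][action]
    outcomes = behaviors[i].get(bs[i], {}).get(action, set())
    return {(t_next, bs[:i] + (b,) + bs[i + 1:]) for b in outcomes}


def compose(target: Target, behaviors: List[Behavior], t0: str,
            b0: List[str]) -> Optional[Dict[Tuple[Joint, str], List[int]]]:
    """Map each surviving (joint state, action) to the behaviors that may safely
    perform it, or return None if the target cannot be realized."""
    start: Joint = (t0, tuple(b0))

    # Forward phase: collect every joint state reachable by some delegation.
    good: Set[Joint] = {start}
    frontier = [start]
    while frontier:
        state = frontier.pop()
        for action in target.get(state[0], {}):
            for i in range(len(behaviors)):
                for nxt in step(target, behaviors, state, action, i):
                    if nxt not in good:
                        good.add(nxt)
                        frontier.append(nxt)

    def choices(state: Joint, action: str) -> List[int]:
        # Behavior i is safe if it can perform the action and every
        # non-deterministic outcome lies in the current set of good states.
        safe = []
        for i in range(len(behaviors)):
            outcomes = step(target, behaviors, state, action, i)
            if outcomes and outcomes <= good:
                safe.append(i)
        return safe

    # Pruning phase (greatest fixpoint): drop states where some action the
    # target may request has no safe behavior, until nothing changes.
    changed = True
    while changed:
        changed = False
        for state in list(good):
            if any(not choices(state, a) for a in target.get(state[0], {})):
                good.discard(state)
                changed = True

    if start not in good:
        return None
    return {(s, a): choices(s, a) for s in good for a in target.get(s[0], {})}


# Hypothetical model of the search and rescue example.
target = {"t0": {"search": "t1"}, "t1": {"assess": "t2"}, "t2": {"evacuate": "t0"}}
scout = {"s0": {"search": {"s0"}}}
# Assessing may leave the diagnosis robot unable to guide the victim (d_wait).
diagnosis = {"d0": {"assess": {"d0", "d_wait"}, "evacuate": {"d0"}},
             "d_wait": {"assess": {"d0", "d_wait"}}}
rescue = {"r0": {"evacuate": {"r0"}}}

plan = compose(target, [scout, diagnosis, rescue], "t0", ["s0", "d0", "r0"])
print(plan[(("t2", ("s0", "d_wait", "r0")), "evacuate")])  # [2]
print(compose(target, [scout, diagnosis], "t0", ["s0", "d0"]))  # None
```

With the rescue robot available, an immobile victim can always be handed to it, so the target is realizable; without it, the state reached after an unlucky assessment has no safe behavior and the pruning propagates back to the initial state. Picking any one index from each list in the returned map yields a valid scheduler, which loosely mirrors the "all schedulers at once" idea; the environment constraints and the approximation notion are outside this sketch.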

Similar resources

Determination of optimal land and water allocation under limited water resources using soil water balance in Ordibehesht canal of Doroodzan water district

ABSTRACT- Inadequate water supply is the major problem for agriculture in arid and semi-arid regions. Thus, effective management should be considered for water resources planning. In this research, a model was provided which is able to estimate optimal land and water allocation in the Doroodzan irrigation network. Optimal water management model was used at farm level to evaluate different defic...

Resource Allocation and Multiagent Policy Formulation for Resource-Limited Agents Under Uncertainty

The problem of optimal policy formulation for teams of resource-limited agents in stochastic environments is composed of two strongly coupled subproblems: a resource allocation problem and a policy optimization problem, both of which have individually received a significant amount of attention. We show how to combine the two problems into a single constrained optimization problem that yields optim...

Approximations in Dynamic Zero-sum Games, I

We develop a unifying approach for approximating a “limit” zero-sum game by a sequence of approximating games. We discuss both the convergence of the values and the convergence of optimal (or “almost” optimal) strategies. Moreover, based on optimal policies for the limit game, we construct policies which are almost optimal for the approximating games. We then apply the general framework to stat...

Smoothing sudden stops

Emerging economies are often exposed to sudden shortages of international financial resources. Yet domestic agents do not seem to take preventive measures against these sudden stops. We highlight the central role played by the limited development of ex ante (insurance) and ex post (spot) domestic financial markets in generating this collective undervaluation of international resources. We study...

Quantized Stationary Control Policies in Markov Decision Processes

For a large class of Markov Decision Processes, stationary (possibly randomized) policies are globally optimal. However, in Borel state and action spaces, the computation and implementation of even such stationary policies are known to be prohibitive. In addition, networked control applications require remote controllers to transmit action commands to an actuator with low information rate. Thes...

Approximations in Dynamic Zero-sum Games

We develop a unifying approach for approximating a “limit” zero-sum game by a sequence of approximating games. We discuss both the convergence of the values and the convergence of optimal (or “almost” optimal) strategies. Moreover, based on optimal policies for the limit game, we construct policies which are almost optimal for the approximating games. We then apply the general framework to stat...

Publication date: 2003